
Add OSS-Fuzz Atheris fuzzers for core serialization #60148

Open
skypher wants to merge 3 commits into apache:main from rmc-infosec:ossfuzz-fuzzers-v2


@skypher

@skypher skypher commented Jan 6, 2026

Summary

Adds an upstream-owned OSS-Fuzz fuzzer suite under ossfuzz/.

Fuzz targets (Atheris):

  • DAG serialization/deserialization (serialized_dag_fuzz)
  • Connection URI parsing (connection_uri_fuzz)
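For readers unfamiliar with Atheris, a fuzz target of this kind usually has roughly the shape below. This is a minimal sketch and not the code in this PR: `deserialize_dag` is a hypothetical stand-in for the real Airflow entry point, and the `atheris` import is guarded so the module can still be imported on platforms without a wheel.

```python
import json
import sys

try:
    import atheris  # only needed to actually run the fuzzer
except ImportError:  # e.g. no prebuilt wheel for this platform
    atheris = None


def deserialize_dag(data: dict) -> dict:
    """Hypothetical stand-in for the real deserialization entry point."""
    if not isinstance(data.get("dag"), dict):
        raise ValueError("missing or malformed 'dag' section")
    return data["dag"]


def TestOneInput(data: bytes) -> None:
    """Feed one fuzzer-generated input to the target."""
    try:
        payload = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, ValueError, RecursionError):
        return  # not valid JSON: uninteresting, not a bug
    if not isinstance(payload, dict):
        return
    try:
        deserialize_dag(payload)
    except (ValueError, KeyError, TypeError):
        pass  # expected rejections; any other exception surfaces as a finding


def main() -> None:
    """Hand control to libFuzzer via Atheris."""
    if atheris is None:
        raise SystemExit("atheris is not installed")
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

With Atheris installed, calling `main()` from an `if __name__ == "__main__":` guard and passing `-max_total_time=10` on the command line would correspond to the short local run mentioned in the test plan.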

Each fuzzer includes:

  • .options files with tuned input size limits
  • .dict files for structured input fuzzing
  • Small seed corpora under ossfuzz/seed_corpus/
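As an illustration of what the `.options` and `.dict` sidecar files contain, the sketch below generates a pair of them. The `max_len` value, the tokens, and the helper name `write_fuzzer_sidecars` are assumptions made for the example, not the values used in this PR; the file layout (a `[libfuzzer]` section in `.options`, one double-quoted token per line in `.dict`) follows the usual libFuzzer conventions.

```python
import configparser
import tempfile
from pathlib import Path


def write_fuzzer_sidecars(target_dir: Path, name: str, max_len: int, tokens: list[str]) -> None:
    """Write <name>.options and <name>.dict next to a fuzz target."""
    target_dir.mkdir(parents=True, exist_ok=True)

    # .options: INI file whose [libfuzzer] section carries libFuzzer flags
    options = configparser.ConfigParser()
    options["libfuzzer"] = {"max_len": str(max_len)}
    with open(target_dir / f"{name}.options", "w", encoding="utf-8") as fh:
        options.write(fh)

    # .dict: one double-quoted token per line, with backslash and quote escaped
    lines = ['"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"' for t in tokens]
    (target_dir / f"{name}.dict").write_text("\n".join(lines) + "\n", encoding="utf-8")


# Example: sidecars for a URI fuzzer, written to a scratch directory
out_dir = Path(tempfile.mkdtemp())
write_fuzzer_sidecars(out_dir, "connection_uri_fuzz", max_len=4096, tokens=["://", "@", "?"])
```

Keeping `max_len` small bounds the size of each generated input, and the dictionary gives the mutator structural tokens it would otherwise have to discover by chance.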

Security Model Alignment

These fuzzers target code paths with clear security boundaries per Airflow's security model, avoiding the "DAG author trust zone" where DAG authors are expected to run arbitrary code.

Test plan

  • Tested locally with atheris (-max_total_time=10)
  • OSS-Fuzz integration build validation

@skypher skypher force-pushed the ossfuzz-fuzzers-v2 branch 2 times, most recently from e6f5a77 to f8ddf18 Compare January 6, 2026 02:10
@jscheffl
Contributor

jscheffl commented Jan 6, 2026

As we have a large repo, we should maybe put this in an existing subfolder, not at the top level. I would propose moving it all under the ci/ folder.

@potiuk
Member

potiuk commented Jan 6, 2026

Agreed, but we have a scripts/ci folder :) - so maybe scripts/ossfuzz?

@skypher skypher force-pushed the ossfuzz-fuzzers-v2 branch 2 times, most recently from 67bbb5b to 5926f49 Compare January 6, 2026 13:41
@potiuk
Member

potiuk commented Jan 6, 2026

Nice! Now just rebase and resolve the conflict :)

@skypher skypher force-pushed the ossfuzz-fuzzers-v2 branch from 5926f49 to fd8a690 Compare January 7, 2026 02:10
@skypher
Author

skypher commented Jan 7, 2026

> Nice! Now just rebase and resolve the conflict :)

Awesome, I think it looks good now!

@potiuk
Member

potiuk commented Jan 7, 2026

Hmm, the problem we see now is that Atheris does not seem to be well prepared for our environment.

Attempting to install Atheris in our image fails. That image is our "gold standard", repeatable, "works for me" environment that we use in CI and when we want to reproduce locally what happens there:

  1. Atheris seems to be mostly a C-based library with a thin Python wrapper around it, and it has no pre-compiled wheels for Python 3.10 - only for 3.11 - 3.13. The build from source then fails because the clang compiler is missing (and fixing that would likely need some configuration of the image environment). We could likely work around this by simply limiting the OSS-Fuzz fuzzers to 3.11 - 3.13, though.

  2. But more importantly, there are no ARM wheels. That means most of our developers will not be able to run it locally on their M1 Macs. This is a bigger issue: anyone who wants to reproduce a failure locally on a Mac either won't be able to, or will find it much more brittle. If they don't use the image, it is even more likely to fail, because their environment is not properly configured for compiling Atheris.

I believe you are somewhat connected to Atheris, @skypher - maybe they could simply update their build and release process to produce binary wheels for all the common platforms, including ARM and macOS? Or at the very least make sure that manylinux ARM wheels are available.

See https://pypi.org/project/atheris/3.0.0/#files - there are just three binary wheels, all for AMD64 (x86-64) only.
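A contributor could check up front whether their machine falls inside the wheel coverage described here, for example to skip fuzzing locally instead of hitting a source build. The sketch below encodes only what this comment states (CPython 3.11 - 3.13, x86-64); the Linux-only restriction and the helper name `likely_has_prebuilt_wheel` are assumptions, and the PyPI file list remains the authoritative source.

```python
import platform
import sys

# Assumptions taken from the comment above, not queried from PyPI:
# prebuilt wheels exist only for CPython 3.11-3.13 on x86-64 Linux.
WHEEL_PYTHONS = {(3, 11), (3, 12), (3, 13)}
WHEEL_MACHINES = {"x86_64", "amd64"}


def likely_has_prebuilt_wheel(py=None, machine=None, system=None) -> bool:
    """Return True if this interpreter/platform is inside the assumed wheel coverage."""
    py = tuple((py or sys.version_info)[:2])
    machine = (machine or platform.machine()).lower()
    system = (system or platform.system()).lower()
    return py in WHEEL_PYTHONS and machine in WHEEL_MACHINES and system == "linux"
```

Called without arguments it inspects the running interpreter; the explicit parameters exist so other platforms can be checked from one machine.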

@skypher
Author

skypher commented Jan 9, 2026

Atheris PR: google/atheris#99

@skypher skypher force-pushed the ossfuzz-fuzzers-v2 branch from fd8a690 to 7378248 Compare January 9, 2026 09:46
@AidenRHall

Hey folks, thanks so much for putting this together. We have dropped support for old Python versions because, now that the bytecode is changing too much between Python versions, it's not realistic to maintain so many versions with the relatively limited engineering cycles we have to devote to this project. You can use older Atheris versions for that, however. Adding ARM / Apple Silicon support is something we have on our roadmap, and we are hoping to get that up and running this year.

I am curious what problem this PR is trying to solve? Having some CI tests sounds useful but why are we moving OSSFuzz tests into the fuzzing framework itself? There are a ton of atheris fuzzers in OSSFuzz, why is this one being moved specifically?

@skypher
Author

skypher commented Jan 13, 2026

> Hey folks, thanks so much for putting this together. We have dropped support for old Python versions because, now that the bytecode is changing too much between Python versions, it's not realistic to maintain so many versions with the relatively limited engineering cycles we have to devote to this project. You can use older Atheris versions for that, however. Adding ARM / Apple Silicon support is something we have on our roadmap, and we are hoping to get that up and running this year.
>
> I am curious what problem this PR is trying to solve? Having some CI tests sounds useful but why are we moving OSSFuzz tests into the fuzzing framework itself? There are a ton of atheris fuzzers in OSSFuzz, why is this one being moved specifically?

Hi @AidenRHall, thanks for the comment!

I think there may be a small misunderstanding - we're not moving anything out of OSS-Fuzz. Airflow doesn't currently have OSS-Fuzz integration at all.

This PR adds new fuzzing harnesses to the Airflow repository itself, following the standard pattern where projects maintain their fuzz targets in-tree. These aren't CI tests - they're fuzz targets intended for continuous fuzzing via OSS-Fuzz. The eventual goal would be to set up a projects/airflow/ configuration in OSS-Fuzz that points to these harnesses.

Does that clarify things?

@potiuk
Member

potiuk commented Jan 16, 2026

And just to add, @AidenRHall: the idea here is that we would also like to experiment with more fuzzing ourselves in Airflow. Our general approach is that we do not add anything to our repo - even if it is going to be run externally by OSS-Fuzz - unless we can easily reproduce it locally.

We are starting small and we want to add more fuzzing in Airflow, and @skypher was kind enough to propose this PR, adding fuzzers that might be usable by OSS-Fuzz. However, if we are to make good use of fuzzing and add it in various parts of Airflow, our contributors need an easy way of iterating on it - adding new fuzzers, modifying existing ones - and all of this should be runnable locally. Many of our contributors develop Airflow on ARM Macs - most PMC members and committers, in fact. So if we are serious about fuzzing and about getting people involved in making good use of it, we need to make it easy for them to contribute to our fuzzing.

This is the main reason why we also try to use our CI to test it. While the Python version is not a blocker (we can easily run it in CI only for Python 3.11+), the lack of native ARM wheels is pretty much a blocker, given the time it takes to build Atheris and the environment needed for the build to succeed.

I hope that clarifies why ARM support is so important for us.

@potiuk
Member

potiuk commented Jan 16, 2026

BTW, @skypher - you can make the Python 3.10 failure go away by adding `; python_version >= "3.11"` to the dependencies in the added pyproject.toml.
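For context, that suffix is an environment marker: installers evaluate it against the running interpreter and skip the dependency when it is false. The sketch below is a hand-rolled equivalent for illustration only; the exact requirement string in the PR's pyproject.toml is an assumption, and `marker_allows` is a hypothetical helper, not a packaging API.

```python
import sys

# How the dependency entry might look with the marker applied (assumed form)
ATHERIS_REQUIREMENT = 'atheris; python_version >= "3.11"'


def marker_allows(version_info=None) -> bool:
    """Hand-rolled equivalent of the marker `python_version >= "3.11"`."""
    # python_version is the "major.minor" part of the interpreter version
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) >= (3, 11)
```

On Python 3.10 the marker evaluates to false, so the installer never attempts to build Atheris from source there.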

@skypher skypher force-pushed the ossfuzz-fuzzers-v2 branch from a6df5aa to d3b6ca3 Compare January 16, 2026 03:13
@skypher
Author

skypher commented Jan 16, 2026

> BTW, @skypher - you can make the Python 3.10 failure go away by adding `; python_version >= "3.11"` to the dependencies in the added pyproject.toml.

Updated, thanks a lot!

@skypher
Author

skypher commented Jan 21, 2026

> Hey folks, thanks so much for putting this together. We have dropped support for old Python versions because, now that the bytecode is changing too much between Python versions, it's not realistic to maintain so many versions with the relatively limited engineering cycles we have to devote to this project. You can use older Atheris versions for that, however. Adding ARM / Apple Silicon support is something we have on our roadmap, and we are hoping to get that up and running this year.

Hey again Aiden, just wondering if there's anything we can do to unblock our Atheris PR for these binaries. For your convenience, here's a copy of the link: google/atheris#99

Let us know please :-)

@skypher
Author

skypher commented Feb 11, 2026

@AidenRHall any update on this? Thanks!

Adds base infrastructure for OSS-Fuzz fuzzing under `scripts/ossfuzz/`.

Includes:
- pyproject.toml with proper Python packaging (Private :: Do Not Upload)
- Dependencies on apache-airflow-core, apache-airflow-providers-standard, atheris
- Entry points for uv run support
- README documenting security model alignment and local testing

Fuzzer for DAG serialization/deserialization targeting
`DagSerialization.from_dict()`.

Used by Scheduler and API Server with schema validation.
Input comes from DAG parsing and caching.

Includes:
- `.options` with max_len tuning
- `.dict` for structured input fuzzing
- Seed corpus with minimal DAG JSON

Adds fuzzer for Connection URI parsing which is a security boundary
(API input validation).

Target: Connection._parse_from_uri() and sanitize_conn_id()

Includes dictionary, options file, and seed corpus.
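
For a sanitizer-style target, a harness can do more than look for unexpected exceptions: it can assert an invariant on the output. The sketch below shows that idea under loud assumptions: `sanitize_id`, the allow-list regex, and the 250-character limit are hypothetical stand-ins chosen for illustration, not the actual rules of `sanitize_conn_id()`.

```python
import re

# Assumed allow-list and length limit, for illustration only; the real
# sanitize_conn_id() rules live in Airflow and may differ.
_ALLOWED = re.compile(r"[\w\-.:/]+")
_MAX_LEN = 250


def sanitize_id(raw: str):
    """Hypothetical stand-in: return raw unchanged if acceptable, else None."""
    if len(raw) <= _MAX_LEN and _ALLOWED.fullmatch(raw):
        return raw
    return None


def TestOneInput(data: bytes) -> None:
    """Fuzz entry point that checks an output invariant, not just crashes."""
    text = data.decode("utf-8", errors="replace")
    result = sanitize_id(text)
    # Invariant: the sanitizer either rejects the input or returns a value
    # that itself satisfies the allow-list and the length limit.
    if result is not None:
        assert len(result) <= _MAX_LEN, "over-long id accepted"
        assert _ALLOWED.fullmatch(result), "disallowed character accepted"
```

Under a fuzzer, a failing assertion is reported like any other uncaught exception, so a regression that lets a disallowed character through would show up as a finding.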
@potiuk potiuk force-pushed the ossfuzz-fuzzers-v2 branch from d3b6ca3 to bb46319 Compare February 15, 2026 19:03
@potiuk
Member

potiuk commented Feb 15, 2026

Would be great to get it in and try it :D

@skypher
Author

skypher commented Feb 16, 2026

> Would be great to get it in and try it :D

Glad to see your ping! What's needed to get it merged? Are we blocked on the Atheris issue or do you think we can proceed as-is? Doesn't seem like they're willing to get this in anytime soon.

@potiuk
Member

potiuk commented Feb 17, 2026

> > Would be great to get it in and try it :D
>
> Glad to see your ping! What's needed to get it merged? Are we blocked on the Atheris issue or do you think we can proceed as-is? Doesn't seem like they're willing to get this in anytime soon.

I guess if ARM is not supported, we can try it without. That will limit local testing, but well, tough.
